Search results for "File format"
showing 10 items of 22 documents
FASTdoop: A versatile and efficient library for the input of FASTA and FASTQ files for MapReduce Hadoop bioinformatics applications
2017
Summary: MapReduce Hadoop bioinformatics applications require the availability of special-purpose routines to manage the input of sequence files. Unfortunately, the Hadoop framework does not provide any built-in support for the most popular sequence file formats like FASTA or BAM. Moreover, the development of these routines is not easy, both because of the diversity of these formats and the need to efficiently manage sequence datasets that may count up to billions of characters. We present FASTdoop, a generic Hadoop library for the management of FASTA and FASTQ files. We show that, with respect to analogous input management routines that have appeared in the literature, it offers…
The Effects of Static and Dynamic Visual Representations as Aids for Primary School Children in Tasks of Auditory Discrimination of Sound Patterns. A…
2018
It has been proposed that non-conventional presentations of visual information could be very useful as a scaffolding strategy in the learning of Western music notation. As a result, this study has attempted to determine if there is any effect of static and dynamic presentation modes of visual information in the recognition of sound patterns. An intervention-based quasi-experimental design was adopted with two groups of fifth-grade students in a Spanish city. Students did tasks involving discrimination, auditory recognition and symbolic association of the sound patterns with non-musical representations, either static images (S group), or dynamic images (D group). The results showed neither s…
The Human Proteome Organization–Proteomics Standards Initiative Quality Control Working Group: Making quality control more accessible for biological …
2017
To have confidence in results acquired during biological mass spectrometry experiments, a systematic approach to quality control is of vital importance. Nonetheless, until now, only scattered initiatives have been undertaken to this end, and these individual efforts have often not been complementary. To address this issue, the Human Proteome Organization–Proteomics Standards Initiative has established a new working group on quality control at its meeting in the spring of 2016. The goal of this working group is to provide a unifying framework for quality control data. The initial focus will be on providing a community-driven standardized file format for quality control. For this purpose, the…
CoverageAnalyzer (CAn): A Tool for Inspection of Modification Signatures in RNA Sequencing Profiles
2016
Combination of reverse transcription (RT) and deep sequencing has emerged as a powerful instrument for the detection of RNA modifications, a field that has seen a recent surge in activity because of its importance in gene regulation. Recent studies yielded high-resolution RT signatures of modified ribonucleotides relying on both sequence-dependent mismatch patterns and reverse transcription arrests. Common alignment viewers lack specialized functionality, such as filtering, tailored visualization, image export and differential analysis. Consequently, the community will profit from a platform seamlessly connecting detailed visual inspection of RT signatures and automated screening for modifi…
Main Steps in Image Processing and Quantification: The Analysis Workflow
2019
In the last decades, the variety of programs, algorithms, and strategies that researchers have at their disposal to process and analyze image files has grown extensively. However, these are only pointless tools if not applied with the careful planning required to achieve a successful image analysis. In order to do so, the analyst must establish a meaningful and effective sequence of orderly operations that is able to (1) overcome all the problems derived from the image manipulation and (2) successfully resolve the question that was originally posed. In this chapter, the authors suggest a set of strategies and present a reflection on the main milestones that compose the image processing workf…
A comparison of HDFS compact data formats: Avro versus Parquet
2017
In this paper, file formats like Avro and Parquet are compared with text formats to evaluate the performance of the data queries. Different data query patterns have been evaluated. Cloudera’s open-source Apache Hadoop distribution CDH 5.4 has been chosen for the experiments presented in this article. The results show that compact data formats (Avro and Parquet) take up less storage space than plain-text data formats because of their binary encoding and compression. Furthermore, data queries on the column-based data format Parquet are faster than on text data formats and Avro.
A sensor-data-based denoising framework for hyperspectral images
2015
Many denoising approaches extend image processing to a hyperspectral cube structure, but take into account neither a sensor model nor the format of the recording. We propose a denoising framework for hyperspectral images that uses sensor data to convert an acquisition to a representation that facilitates noise estimation, namely the photon-corrected image. This photon-corrected image format accounts for the most common noise contributions and is spatially proportional to spectral radiance values. The subsequent denoising is based on an extended variational denoising model, which is suited for Poisson-distributed noise. A spatially and spectrally adaptive total variation regularisation term…
Software for simulating dichromatic perception of video streams
2013
We have designed a configurable stand-alone Matlab-based software to simulate dichromatic perception of video streams. The algorithm used is an extension for video streams of the “corresponding pair algorithm” by Capilla and coworkers for simulation of dichromatic perception of images. The software allows the user to upload a video sequence and to process it using different dichromatic color vision models and viewing conditions. The output video may be generated in different spatial and temporal resolutions and file formats. The functions for the Matlab environment and a stand-alone application may be downloaded from the Repository of the University of Alicante. © 2013 Wiley Periodicals, Inc. C…
tbg - a new file format for genomic data
2021
Motivation: Determining whether a Single-Nucleotide Polymorphism (SNP), or a variant in general, leads to a change in the amino acid sequence of a protein-coding gene is often a laborious and time-consuming challenge. Here, we introduce the tbg file format for storing genomic data and tbg-tools, a user-friendly toolbox for the faster analysis of SNPs. The file format stores information for each nucleotide in each gene, making it possible to predict which change in the amino acid sequence will be caused by a variant in the nucleotide sequence. Our new tool therefore has the potential to make biological sense of the unprecedented amount of genome-wide genetic variation that research…
Three-domain image representation for personal photo album management
2010
In this paper we present a novel approach for personal photo album management. Pictures are analyzed and described in three representation spaces, namely, faces, background and time of capture. Faces are automatically detected and rectified using a probabilistic feature extraction technique. Face representation is then produced by computing PCA (Principal Component Analysis). Backgrounds are represented with low-level visual features based on RGB histogram and Gabor filter bank. Temporal data is obtained through the extraction of EXIF (Exchangeable image file format) data. Each image in the collection is then automatically organized using a mean-shift clustering technique. While many system…